HIVE-29578: Iceberg: add support for native views#6449

Open
difin wants to merge 1 commit into apache:master from difin:iceberg_native_views

@difin
Contributor

@difin difin commented Apr 23, 2026

What changes were proposed in this pull request?

Added support for Iceberg native views in Hive for both HMS and REST catalogs.

There is a limitation in the current implementation: when Hive uses a REST catalog and creates a view on a partitioned Iceberg table, querying the view only works with CBO disabled. To be addressed in a follow-up PR.

Why are the changes needed?

To support Iceberg native views. This can be especially useful for REST Catalog clients.

Does this PR introduce any user-facing change?

Yes, new HQL syntax:

create view <view_name> as select * from <src_tbl> stored by iceberg;

How was this patch tested?

Added new unit and integration tests, and updated existing ones, with Iceberg native view test cases.

@difin difin force-pushed the iceberg_native_views branch from 4fdad42 to 252c608 Compare April 24, 2026 23:06
@difin difin force-pushed the iceberg_native_views branch from 252c608 to e10eba5 Compare April 24, 2026 23:31
@difin difin marked this pull request as ready for review April 24, 2026 23:31
@difin difin changed the title HIVE-29578: Iceberg: support for Iceberg native views HIVE-29578: Iceberg: support native views Apr 24, 2026
@difin difin changed the title HIVE-29578: Iceberg: support native views HIVE-29578: Iceberg: add support for native views Apr 24, 2026
@difin difin force-pushed the iceberg_native_views branch from e10eba5 to 96fa476 Compare April 25, 2026 20:46
@difin difin requested review from deniskuzZ and kasakrisz April 25, 2026 20:46
@difin difin force-pushed the iceberg_native_views branch from 96fa476 to 114412a Compare April 26, 2026 15:12

delete from src_ice where last_name in ('ln1a', 'ln2a', 'ln7a');

create view v_ice as select * from src_ice stored by iceberg;
Contributor

IMHO the syntax should follow the materialized view syntax:

create materialized view mat1 stored by iceberg stored as orc tblproperties ('format-version'='1') as
select tbl_ice.b, tbl_ice.c from tbl_ice where tbl_ice.c > 52;

I checked some other database engines that support Iceberg logical views (Trino, Dremio). None of them adds extra keywords to the SQL syntax; instead, they let you choose the catalog where the view is stored, and that catalog should be an Iceberg catalog.

Comment on lines +205 to +209
/**
* Optional trailing {@code tableFileFormat} on CREATE VIEW: only {@code STORED BY ICEBERG} is allowed
* (no serde properties or {@code STORED AS} tail).
*/
private boolean validateOptionalViewStorageClause(ASTNode storageRoot) throws SemanticException {
Contributor

The keywords STORED BY ICEBERG are a bit confusing because no data is actually stored in the case of logical views. Some engines do not require extra keywords to specify when creating Iceberg logical views.

If we insist on using keywords, how about something like these?

create view <view_name> viewproperties(format='iceberg')
as select...;

create view <view_name> format iceberg
as select...;

If we decide to go with the STORED BY ICEBERG keywords, please create a new grammar rule specifically for views—similar to tableFileFormat—called viewMetadataFormat. This should limit the grammar to the STORED BY <identifier> syntax. By doing this, you can eliminate the need for extra validation checks in the analyzer.

I recommend checking the configuration setting hive.default.storage.handler.class when deciding where to store the view metadata. If a storage handler is set that supports views, let's use the Storage Handler API to store the metadata.

Contributor Author

@difin difin May 5, 2026

  • Created a rule viewMetadataFormat.
  • Moved STORED BY ICEBERG closer to the view definition, similar to CTAS, i.e. `STORED BY ICEBERG AS SELECT`.
  • Made STORED BY ICEBERG optional. If it is not specified, the type is deduced from the hive.default.storage.handler.class configuration.
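
The fallback described in the last bullet can be sketched roughly as follows. This is only an illustration, not the code in this PR: the class and method names are hypothetical, the session configuration is modeled as a plain map, and the check that the configured handler actually supports views is left out.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: decides which storage handler (if any) owns a new view's metadata. */
final class ViewHandlerResolutionSketch {

  // Configuration key named in the review thread.
  static final String DEFAULT_HANDLER_KEY = "hive.default.storage.handler.class";

  /**
   * @param explicitHandler handler named in the CREATE VIEW statement, or null if the clause was omitted
   * @param conf            session configuration, modeled here as a plain map (assumption)
   * @return the handler to use, or empty for a classic HMS view
   */
  static Optional<String> resolve(String explicitHandler, Map<String, String> conf) {
    // An explicit clause always wins over the configured default.
    if (explicitHandler != null && !explicitHandler.isBlank()) {
      return Optional.of(explicitHandler.trim());
    }
    // Otherwise fall back to the configured default handler, if one is set.
    String fallback = conf.get(DEFAULT_HANDLER_KEY);
    if (fallback != null && !fallback.isBlank()) {
      return Optional.of(fallback.trim());
    }
    // No handler at all: the view stays a regular HMS view.
    return Optional.empty();
  }
}
```

With an empty configuration and no clause, `resolve(null, Map.of())` returns an empty `Optional`, which in this sketch stands for the existing non-Iceberg view path.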

result.setLastAccessTime(nowSec);
result.setRetention(Integer.MAX_VALUE);

boolean hiveEngineEnabled = false;
Contributor

What is hiveEngineEnabled and why is it false?

Contributor Author

@difin difin May 5, 2026

hiveEngineEnabled switches how HiveOperationsBase.storageDescriptor fills the storage descriptor: with HiveIcebergInputFormat / HiveIcebergOutputFormat / HiveIcebergSerDe when true, or with the usual placeholder FileInputFormat / FileOutputFormat / LazySimpleSerDe when false.

Why it’s false in toHiveView:

This path materializes an HMS VIRTUAL_VIEW for REST catalogs that expose Iceberg view metadata through the HMS API. That row isn’t meant to drive a Hive table scan the way a real Iceberg table commit does; execution still comes from the view definition / catalog, not from wiring Iceberg MR formats on the stub. HiveViewOperations does the same thing (hiveEngineEnabled = false).

So we keep a minimal SD consistent with normal virtual views and avoid implying this HMS object is an Iceberg-backed table for the Hive engine. For tables, HiveTableOperations still turns engine integration on/off via metadata + ConfigProperties.ENGINE_HIVE_ENABLED where that actually matters.
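
As a rough illustration of the switch described above, the following sketch maps the flag to the three class names mentioned in this reply. The class and method are hypothetical and only return simple names as strings; the real logic lives in HiveOperationsBase.storageDescriptor and builds an actual storage descriptor.

```java
import java.util.List;

/** Hypothetical sketch of the format selection driven by hiveEngineEnabled. */
final class StorageDescriptorSketch {

  /** Returns [inputFormat, outputFormat, serde] as simple class names for the given flag. */
  static List<String> formatsFor(boolean hiveEngineEnabled) {
    return hiveEngineEnabled
        // Engine integration on: Iceberg-aware formats and SerDe.
        ? List.of("HiveIcebergInputFormat", "HiveIcebergOutputFormat", "HiveIcebergSerDe")
        // Engine integration off: placeholder formats, as used for the view stub.
        : List.of("FileInputFormat", "FileOutputFormat", "LazySimpleSerDe");
  }
}
```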

private static ViewBuilder applyCommentAndTblProps(
ViewBuilder builder, Map<String, String> tblProperties, String comment) {
ViewBuilder viewBuilder = builder;
if (comment != null && !comment.isEmpty()) {
Contributor

How about isNotBlank?
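
For context, the two checks differ only for whitespace-only comments. The sketch below assumes the suggestion refers to a helper such as StringUtils.isNotBlank from Apache Commons Lang; to stay self-contained it uses the JDK's String.isBlank (Java 11+) as a stand-in, and the class name is hypothetical.

```java
/** Hypothetical sketch comparing the original check with a blank-aware one. */
final class CommentCheckSketch {

  /** The check from the diff: accepts any non-null, non-empty string, including "   ". */
  static boolean notEmpty(String s) {
    return s != null && !s.isEmpty();
  }

  /** Blank-aware variant: additionally rejects strings that contain only whitespace. */
  static boolean notBlank(String s) {
    return s != null && !s.isBlank();
  }
}
```

With the blank-aware variant, a whitespace-only comment is treated the same as a missing one.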

Contributor Author

Done

return (ViewCatalog) catalog;
}

private static ViewBuilder startViewBuilder(
Contributor

nit: startViewBuilder, applyCommentAndTblProps and commitView don't add much value when we already have a builder.

Contributor Author

Done - replaced the overly verbose methods with inline code.

Comment on lines +57 to +58
if (cat.viewExists(id)) {
cat.dropView(id);
Contributor

What is the result of dropView when a view with the specified name doesn't exist? I'm wondering whether the cat.viewExists(id) check is necessary.

Contributor Author

dropView returns false when a view with the specified name doesn't exist, and true otherwise. I removed the cat.viewExists(id) check.
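
The reason the guard is redundant can be shown with a small in-memory stand-in. This is a hypothetical class, not the Iceberg catalog; it only mirrors the return-value contract described in this reply.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory stand-in for a view catalog, used only to illustrate the contract. */
final class DropViewSketch {
  private final Set<String> views = new HashSet<>();

  void createView(String name) {
    views.add(name);
  }

  /** Returns true if a view was dropped, false if no view with that name existed. */
  boolean dropView(String name) {
    // Set.remove already reports whether the element was present, so no exists() check is needed.
    return views.remove(name);
  }
}
```

Calling `dropView` twice in a row for the same name is safe in this sketch: the second call simply returns false.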

}

@Test
public void testIfNotExistsReturnsFalseWhenViewExists() throws Exception {
Contributor

Isn't the test method name testIfNotExistsReturnsFalseWhenViewExists misleading? We are testing createOrReplaceNativeView, not an IfNotExists method.

Contributor Author

Yes, the name was vague: IfNotExists is not a method but one of the parameters.
I renamed the method to:
testCreateOrReplaceNativeViewSkipsWhenViewExistsAndIfNotExistsFlagTrue
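
The behavior the renamed test targets might look roughly like the sketch below. Everything here is hypothetical: the class, the parameter order, and the choice to let the if-not-exists flag take precedence are assumptions for illustration and are not taken from createOrReplaceNativeView in this PR.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of create-or-replace semantics with an if-not-exists flag. */
final class CreateOrReplaceSketch {
  private final Map<String, String> sqlByView = new HashMap<>();

  /**
   * @return true if the catalog was changed, false if the call was skipped
   * @throws IllegalStateException if the view exists and neither flag allows proceeding
   */
  boolean createOrReplace(String name, String sql, boolean replace, boolean ifNotExists) {
    if (sqlByView.containsKey(name)) {
      if (ifNotExists) {
        return false; // skip silently and keep the existing definition
      }
      if (!replace) {
        throw new IllegalStateException("View already exists: " + name);
      }
    }
    sqlByView.put(name, sql);
    return true;
  }

  /** Returns the stored definition, or null if the view does not exist. */
  String sqlOf(String name) {
    return sqlByView.get(name);
  }
}
```

A test against such a method can assert on the stored definition as well as the return value, which distinguishes a skipped call from a replacement.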

create view v_ice as select * from src_ice stored by iceberg;

select * from v_ice;

Contributor

Could you please add

  • a logical view which does some transformation on its base table, and a query from it?
  • views created both with and without the schema specified?

Contributor Author

@difin difin May 5, 2026

logical view which does some transformation on its base table and query from it?

This is not supported by Hive itself:

update v_ice set last_name = last_name + 'a' 
fname=iceberg_native_view.q

See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs.
 org.apache.hadoop.hive.ql.parse.SemanticException: You cannot update or delete records in a view
	at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.validateTargetTable(RewriteSemanticAnalyzer.java:265)
	at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyze(RewriteSemanticAnalyzer.java:84)
	at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:73)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:358)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:499)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:451)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:415)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:234)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:790)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:760)
	at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
	at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:139)

create views when the schema is specified and not specified.

Done

Contributor

logical view which does some transformation on its base table

Sorry, I meant something like

select first_name || last_name from ... where <some filter condition>

because

select * from table;

as a view definition is a kind of edge case. It is ok for testing but not a typical use-case.

Contributor Author

Done

break;
}
}
boolean icebergNativeView = validateOptionalViewStorageClause(storageClause);
Contributor

Please do not hardcode anything like Iceberg into compiler code. The compiler is independent of the storage handler. I'm aware that we already have a lot of code that violates this principle, and it already causes a lot of trouble.

Contributor Author

@difin difin May 5, 2026

Fixed - moved all Iceberg-specific code into HiveIcebergStorageHandler and kept generic interfaces in the Compiler.

private static final long serialVersionUID = 1L;

/** HMS table property set when the view is declared with {@code STORED BY ICEBERG} (native Iceberg view). */
public static final String ICEBERG_NATIVE_VIEW_PROPERTY = "hive.iceberg.native.view";
Contributor

Please remove this from here.

Contributor Author

Done

private final boolean ifNotExists;
private final boolean replace;
private final List<FieldSchema> partitionColumns;
private final boolean icebergNativeView;
Contributor

Please remove this from here.

Contributor Author

Done

Comment on lines +104 to +107
@Explain(displayName = "iceberg native view", displayOnlyOnTrue = true)
public boolean isIcebergNativeView() {
return icebergNativeView;
}
Contributor

Please remove this from here.

Contributor Author

Done

@sonarqubecloud

sonarqubecloud Bot commented May 9, 2026

Comment on lines +448 to +449
VIEW_STORAGE_HANDLER_UNSUPPORTED(10448, "CREATE VIEW only supports STORED BY ICEBERG for native "
+ "Iceberg views; unsupported storage clause: {0}", true),
Contributor

Please rephrase this error message. Remove STORED BY and let Iceberg be a parameter.

Comment on lines +163 to +166
TableName.fromString(
view.name(), MetaStoreUtils.getDefaultCatalog(conf), Warehouse.DEFAULT_DATABASE_NAME);
result.setCatName(tableName.getCat());
result.setDbName(tableName.getDb());
Contributor

What happens when a custom db is specified?

create view my_db.myview as...
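
To make the question concrete, here is a minimal sketch of resolving a possibly qualified name against default catalog and database values. It is a hypothetical reimplementation for illustration, not Hive's TableName.fromString, and whether view.name() in the diff carries the database prefix is not stated in this thread.

```java
/** Hypothetical sketch: splits a dotted name and fills in defaults for missing leading parts. */
final class QualifiedNameSketch {

  /** Returns {catalog, database, name}. */
  static String[] parse(String name, String defaultCatalog, String defaultDb) {
    String[] parts = name.split("\\.");
    switch (parts.length) {
      case 1: // "myview": both defaults apply
        return new String[] {defaultCatalog, defaultDb, parts[0]};
      case 2: // "my_db.myview": only the catalog default applies
        return new String[] {defaultCatalog, parts[0], parts[1]};
      case 3: // "cat.my_db.myview": fully qualified
        return parts;
      default:
        throw new IllegalArgumentException("Not a valid view name: " + name);
    }
  }
}
```

If the name passed in were only the unqualified view name, a view created in my_db would end up under the default database in this sketch, which is the situation the question asks about.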

return conf;
}

private HiveCatalog verifyCatalog() {
Contributor

Why is this called verify? Based on its implementation, this method is more like loading a catalog.
Could you please share some background on the use case for this method?

conf, DB, VIEW, cols, "select 2 as id", null, null, true, false))
.isTrue();

assertThat(verifyCatalog().viewExists(TableIdentifier.of(DB, VIEW))).isTrue();
Contributor

I think it is worth checking whether the view definition is actually altered.

('fn7','ln7', 2);

----------------------------------------------------------------
-- Iceberg native view via TBLPROPERTIES before AS
Contributor

Does "before AS" add any value in this comment? AFAIK the grammar only allows it this way.

iceberg_create_locally_zordered_table.q,\
iceberg_merge_delete_files.q,\
iceberg_merge_files.q,\
iceberg_native_view.q,\
Contributor

Is there anything LLAP-specific in view DDLs? If not, can this test be run by the default Iceberg driver?

Comment on lines +93 to +102
@Test
public void testParseCreateViewTblpropertiesViewFormatIceberg() throws Exception {
ASTNode tree = parseDriver.parse(
"create view v1 tblproperties ('view-format'='iceberg') as select * from t", null).getTree();
assertTrue(tree.dump(), tree.toStringTree().contains("tok_createview"));
assertTrue(tree.dump(), tree.toStringTree().contains("tok_tableproperties"));
assertTrue(tree.dump(), tree.toStringTree().contains("view-format"));
assertTrue(tree.dump(), tree.toStringTree().contains("iceberg"));
}

Contributor

This test class is about testing the default keyword. Could you please move this test to the class that covers view creation? Please create a new one if it does not exist. Since this grammar is not Iceberg-specific, it can be a generic one.

Comment on lines +248 to +251
if (explicitViewFormat) {
throw new SemanticException(ErrorMsg.VIEW_STORAGE_HANDLER_UNSUPPORTED.getMsg(
"Native view metadata is not supported for storage handler: " + handlerClass));
}
Contributor

We already know at the caller whether the view format is explicit.
Could you please move this check to the caller method?

Comment on lines +54 to +57
if (desc.usesNativeViewCatalog()) {
executeNativeCatalogView();
return 0;
}
Contributor

I think this decision should be moved to BaseHiveIcebergMetaHook or HiveIcebergMetaHook.preCreateTable like we do in case of Iceberg tables.

Please set the StorageHandlerClass in CreateViewOperation.createViewObject to the newly created ql.metadata.Table object because later this object is passed to HMS and the meta hook.

    if (desc.usesNativeViewCatalog()) {
      storageFormat.setStorageHandler(desc.getNativeViewStorageHandlerClass());
      view.setProperty(
          org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_STORAGE,
          desc.getNativeViewStorageHandlerClass());
    }

I'm not a big fan of this meta hook solution, but this is what we have for Iceberg tables, and IMHO it is better to be consistent.

create or replace and if not exists probably don't have to be handled separately for Iceberg views if you make this change.

Please add some tests for create or replace and if not exists to the q test iceberg_native_view.q.

}

/**
* Resolves {@code STORED BY identifier} for CREATE VIEW (short names such as {@code ICEBERG} or an FQCN).
Contributor

Was STORED BY left here intentionally?

* Keys should be removed when {@linkplain #clearNativeViewHmsTableProperties(Map)} is invoked for the same
* handler class recorded under {@link Constants.NATIVE_VIEW_STORAGE_HANDLER_CLASS_PARAM}.
*/
default Map<String, String> getNativeViewHmsTableProperties() {
Contributor

Can we call this simply getViewProperties? I would like to understand what native and hms mean in this context.

AFAIK, a native object is one that doesn't have a storage handler. To me, the word native in these method names is misleading. Could you please elaborate on this a bit?

getNativeViewHmsTableProperties
clearNativeViewHmsTableProperties
createOrReplaceNativeView
supportsNativeViewCatalog


3 participants